Non-linear maximum likelihood feature transformation for speech recognition
Abstract
Most automatic speech recognition (ASR) systems use a hidden Markov model (HMM) with a diagonal-covariance Gaussian mixture model for the state-conditional probability density function. The diagonal-covariance Gaussian mixture can model discrete sources of variability such as speaker variations, gender variations, or local dialect, but cannot model continuous types of variability that account for correlation between the elements of the feature vector. In this paper, we present a transformation of the acoustic feature vector that minimizes an empirical estimate of the relative entropy between the likelihood based on the diagonal-covariance Gaussian mixture HMM and the true likelihood. Based on this formulation, we provide a solution to the problem using volume-preserving maps; existing linear feature transform designs are shown to be special cases of the proposed solution. Since most of the acoustic features used in ASR are not linear functions of the sources of correlation in the speech signal, we use a non-linear transformation of the features to minimize this objective function. We describe an iterative algorithm that estimates the parameters of both the volume-preserving feature transformation and the HMM so as to jointly optimize the objective function for an HMM-based speech recognizer. Using this algorithm, we achieved a 2% improvement in phoneme recognition accuracy compared to the baseline system. Our approach also improves recognition accuracy compared to previous linear approaches such as linear discriminant analysis (LDA), maximum likelihood linear transform (MLLT), and independent component analysis (ICA).
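The role of the volume-preserving constraint can be illustrated with a minimal sketch. When a map has a Jacobian determinant of exactly 1, the change-of-variables formula adds no correction term, so likelihoods computed on the transformed features are directly comparable to likelihoods computed on the original features. The additive coupling map below, the function names `coupling_map`, `numerical_jacobian`, and `diag_gaussian_avg_loglik`, and the synthetic data are all illustrative assumptions; they are not the map family or the estimation algorithm of the paper, and a single diagonal-covariance Gaussian stands in for the Gaussian mixture HMM.

```python
import numpy as np

def coupling_map(x, w):
    """Hypothetical volume-preserving map: y1 = x1, y2 = x2 + tanh(x1 @ w).

    The Jacobian is unit lower-triangular, so its determinant is 1.
    """
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    return np.concatenate([x1, x2 + np.tanh(x1 @ w)], axis=-1)

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at a single point x (1-D array)."""
    n = x.size
    jac = np.zeros((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        jac[:, j] = (f(x + step) - f(x - step)) / (2 * eps)
    return jac

def diag_gaussian_avg_loglik(y):
    """Average log-likelihood of the rows of y under a diagonal-covariance
    Gaussian whose mean and variances are the ML estimates from y itself."""
    mean = y.mean(axis=0)
    var = y.var(axis=0)
    per_frame = -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mean) ** 2 / var, axis=1)
    return float(per_frame.mean())

# Synthetic 2-D "features" whose second element depends non-linearly on the first.
rng = np.random.default_rng(0)
w = np.array([[2.0]])  # assumed known here; in practice it would be estimated
x1 = rng.normal(size=(5000, 1))
x2 = -np.tanh(x1 @ w) + 0.1 * rng.normal(size=(5000, 1))
x = np.concatenate([x1, x2], axis=1)
y = coupling_map(x, w)

det = np.linalg.det(numerical_jacobian(lambda v: coupling_map(v, w), x[0]))
ll_before = diag_gaussian_avg_loglik(x)
ll_after = diag_gaussian_avg_loglik(y)
print(det, ll_before, ll_after)
```

In this toy setting the map removes the non-linear dependence between the two elements, so the diagonal-covariance model fits the transformed features better than the original ones. A linear map with unit determinant would be a special case of the same constraint.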
Similar resources
Generalized discriminative feature transformation for speech recognition
We propose a new algorithm called Generalized Discriminative Feature Transformation (GDFT) for acoustic models in speech recognition. GDFT is based on Lagrange relaxation on a transformed optimization problem. We show that the existing discriminative feature transformation methods like feature space MMI/MPE (fMMI/MPE), region dependent linear transformation (RDLT), and a non-discriminative feat...
Large vocabulary conversational speech recognition with the extended maximum likelihood linear transformation (EMLLT) model
This paper applies the recently proposed Extended Maximum Likelihood Linear Transformation (EMLLT) model in a Speaker Adaptive Training (SAT) context on the Switchboard database. Adaptation is carried out with maximum likelihood estimation of linear transforms for the means, precisions (inverse covariances) and the feature-space under the EMLLT model. This paper shows the first experimental evi...
Maximum Likelihood Non-linear Transformation for Environment Adaptation in Speech Recognition Systems
In this paper, we describe an adaptation method for speech recognition systems that is based on a piecewise-linear approximation to a non-linear transformation of the feature space. The method extends a previously proposed non-linear transformation (NLT) technique by making the transformation function more sophisticated (piecewise-linear instead of piecewise-constant), and by computing the trans...
Review on Heteroscedastic Discriminant Analysis
Discriminant feature spaces are an attractive way to improve the word error rate performance of speech recognition systems. Heteroscedastic discriminant analysis (HDA) is a generalized method for feature space transformation that does not impose the equal within-class covariance assumptions required by standard linear discriminant analysis (LDA). It will be shown that the co...
Maximum Likelihood Linear Transformations for HMM
This paper examines the application of linear transformations for speaker and environmental adaptation in an HMM-based speech recognition system. In particular, transformations that are trained in a maximum likelihood sense on adaptation data are investigated. Other than in the form of a simple bias, strict linear feature-space transformations are inappropriate in this case. Hence, only model-b...
Publication date: 2003